Utilize Probabilistic Topic Models to Enrich Knowledge Bases

Authors

  • Laura Dietz
  • Avaré Stewart
Abstract

In publication-driven domains such as the scientific community, the availability of topic information in the form of a taxonomy and associated publications is essential. State-of-the-art methods for topic extraction in the Semantic Web community either need high manual effort (e.g. when using categorization) or rely on error-prone techniques such as hierarchical clustering. We present an alternative solution that uses probabilistic topic models, a technique for unsupervised topic extraction based on statistical inference. The topic model can autonomously perform tasks that require massive data processing, such as identifying topics and associating publications with multiple topics. Only for tasks that require intellectual activity, and for which no reliable automated techniques are available, is the user asked for assistance. In this work we explain how the results of the topic model are stored in a knowledge base for later reuse. We describe how the stored information can be interpreted to provide diagnostic support for manual topic refinement. We delineate how the extracted topic information can be exploited in a community service application for the end user.

Similar resources

Combining Thesaurus Knowledge and Probabilistic Topic Models

In this paper we present an approach for introducing thesaurus knowledge into probabilistic topic models. The main idea of the approach is based on the assumption that the frequencies of semantically related words and phrases that occur in the same texts should be enhanced: this leads to their larger contribution to the topics found in these texts. We have conducted experiments with s...

Tractable Markov Logic

Tractable subsets of first-order logic are a central topic in AI research. Several of these formalisms have been used as the basis for first-order probabilistic languages. However, these are intractable, losing the original motivation. Here we propose the first non-trivially tractable first-order probabilistic language. It is a subset of Markov logic, and uses probabilistic class and part hierar...

Grounding Topic Models with Knowledge Bases

Topic models represent latent topics as probability distributions over words which can be hard to interpret due to the lack of grounded semantics. In this paper, we propose a structured topic representation based on an entity taxonomy from a knowledge base. A probabilistic model is developed to infer both hidden topics and entities from text corpora. Each topic is equipped with a random walk ov...

Finding Answers to Definition Questions

Current research on Question Answering concerns more complex questions than factoid ones. Although definition questions have been investigated by many researchers, acquiring accurate answers remains a core problem for definition QA. Although some systems use web knowledge bases to improve answer acquisition, we propose an approach that leverages them in an effective way. After summarizing defi...

Most Probable Explanations for Probabilistic Database Queries (Extended Abstract)

Probabilistic databases (PDBs) have been widely studied in the literature, as they form the foundations of large-scale probabilistic knowledge bases like NELL and Google’s Knowledge Vault. In particular, probabilistic query evaluation has been investigated intensively as a central inference mechanism. However, despite its power, query evaluation alone cannot extract all the relevant information...

Publication date: 2006